Abstract: This paper considers the problem of developing algorithms for the distributed fusion of Gaussian Mixture Models through the use of Chernoff information. We derive a first order approximation and show that, in a distributed tracking problem in which sensor nodes are equipped with only range-only or bearing-only sensors, it yields consistent estimates.
1 Introduction

In many estimation problems, the assumption that random variables are independent, or that the correlations between them are known, does not hold. Errors in the process and observation models, for example, lead to correlated process and observation noises [1]. Even if the models are known perfectly, computational and storage requirements often mean that the full correlation information cannot be maintained. One important class of problems in which this arises is distributed data fusion (DDF).

DDF networks are composed of a networked set of nodes. Nodes can fuse data acquired locally (from sensors) and remotely (from information propagated from other nodes). Because estimates rather than raw sensor data are propagated, double counting has to be avoided by factoring out common information [2]. When the network is known to have a tree-connected topology, a single path exists between any pair of nodes. This fact can be exploited to calculate the mutual information between the nodes using channel filters [3]. However, there are two important limitations with this approach. First, when the connection topology is arbitrary, channel filters cannot be used and, in fact, no local solution can be applied [4]. Second, when the estimates are not Gaussian, the factoring process used in a channel filter does not appear to have a closed form solution and computationally expensive numerical methods must be used instead [5]. An alternative approach, known as Covariance Intersection (CI), was proposed in [6, 7]. Given a set of estimates which are described by their means and covariances, CI provides a mechanism for fusing them together such that the fused estimate remains consistent. Although CI provides a very powerful and general method for fusing data in arbitrary networks, it only utilizes the mean and covariance of the estimates and cannot exploit any additional information about the probability distribution of the estimates.

Although the mean and covariance representation has been successfully used in many tracking systems, it has a number of limitations that can be encountered in a variety of practical contexts. For example, a single large mean and covariance is a poor representation of the uncertainty associated with a range-only sensor [8] or a bearing-only sensor [9]. Furthermore, multiple hypothesis tracking is poorly represented by a single large covariance. Therefore, we seek methods to generalize CI to exploit more information than a mean and covariance representation.

In this paper we develop an algorithm to extend CI to Gaussian Mixture Models (GMMs). Our algorithm is based on a first order approximation to the Chernoff Information. We describe our approach as empirical for two reasons. First, the only justification we have for using Chernoff Information as the basis of a generalization of CI is based on the observations by Mahler [10] and Hurley [11] (discussed in more detail below). Second, to develop a closed form solution, we use first order approximations of the Chernoff Information for GMMs and a simplified cost function for the optimization process. These approximations introduce their own sources of error and lead to extremely complicated error analysis. Rather than attempt to prove the properties of the algorithm theoretically, we demonstrate its performance on a distributed tracking application. Our results show that, despite these approximations, the algorithm is both consistent (in a mean squared error sense) and outperforms the only other algorithm we are aware of that tries to extend CI to GMMs [12].

The structure of this paper is as follows. Section 2 describes the distributed data fusion problem and the Chernoff Information solution. Our closed form approximation to Chernoff Information is developed in Section 3 and its properties are analysed. The performance of the algorithm is illustrated in a distributed tracking problem in Section 4 and conclusions are drawn in Section 5.


2  Problem  Statement 

The problem of fusing data from two sources can be posed using Bayes' Rule. Mathematically, this can be written as

\[ P(x_k \mid Z_k^a \cup Z_k^b) \propto \frac{P(x_k \mid Z_k^a)\, P(x_k \mid Z_k^b)}{P(x_k \mid Z_k^a \cap Z_k^b)} \]

where $P(x_k \mid Z_k^a)$ is the probability distribution at node a, $P(x_k \mid Z_k^b)$ is that for node b and $P(x_k \mid Z_k^a \cap Z_k^b)$ is the common information between the two nodes. In an arbitrary network, where multiple paths can exist between a and b, $P(x_k \mid Z_k^a \cap Z_k^b)$ cannot be calculated using local information, and the entire network must be considered. Given that the network can scale to thousands or millions of nodes, this erodes many of the advantages of distributed data fusion networks including their scalability, robustness and flexibility.

To overcome these limitations, a number of authors have attempted to develop methods that avoid the need to calculate the properties of the entire network. Mutamabara [13] and Berg [14], for example, developed methods in which only subsets of state information need to be distributed to subsets of nodes. Grime [3] developed algorithms for tree connected structures using channel filters considering mean and covariance representations. More generally, Chong and Mori [15] used graph theory to identify information within state estimates that is guaranteed to be conditionally independent and can be distributed amongst nodes. However, all of these solutions rely on specific assumptions about the network topology (it is tree-connected) and/or the structure of the state space (such that conditionally independent nodes can be identified). Neither condition holds for a general ad hoc network with arbitrary, time varying system models.

In [6, 7], a data fusion algorithm called Covariance Intersection (CI) was presented. Suppose that the means and covariances of $P(x_k \mid Z_k^a)$ and $P(x_k \mid Z_k^b)$ are (a, A) and (b, B) respectively. Let the mean and covariance of the update be (c, C). Let $\tilde{a}$ and $\tilde{b}$ be the errors in the estimates. If the estimates are consistent in the sense

\[ A - E[\tilde{a}\tilde{a}^T] \ge 0, \qquad B - E[\tilde{b}\tilde{b}^T] \ge 0 \tag{1} \]

where $\ge 0$ denotes positive semidefinite, the CI update rule is

\[ C^{-1} = \omega A^{-1} + (1-\omega) B^{-1} \]
\[ c = C\left(\omega A^{-1} a + (1-\omega) B^{-1} b\right). \]

This update is guaranteed to be consistent in the sense

\[ C - E[\tilde{c}\tilde{c}^T] \ge 0 \tag{2} \]

for $\omega \in [0, 1]$. The CI equations are equivalent to the Kalman filter equations with A replaced by $A/\omega$ and B replaced by $B/(1-\omega)$.
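The CI update rule above is short enough to sketch directly. The following is a minimal illustration, not code from the paper: the function names `ci_fuse` and `ci_fuse_min_det` are hypothetical, and choosing $\omega$ by a grid scan that minimizes the determinant of C is one common choice assumed here for the example.

```python
import numpy as np

def ci_fuse(a, A, b, B, omega):
    """Covariance Intersection of (a, A) and (b, B) for a weight omega in [0, 1]."""
    A_inv = np.linalg.inv(A)
    B_inv = np.linalg.inv(B)
    # C^{-1} = omega A^{-1} + (1 - omega) B^{-1}
    C = np.linalg.inv(omega * A_inv + (1.0 - omega) * B_inv)
    # c = C (omega A^{-1} a + (1 - omega) B^{-1} b)
    c = C @ (omega * A_inv @ a + (1.0 - omega) * B_inv @ b)
    return c, C

def ci_fuse_min_det(a, A, b, B, n_grid=101):
    """Choose omega by a grid scan minimizing det(C) (an assumed cost, for illustration)."""
    omegas = np.linspace(0.0, 1.0, n_grid)
    best = min(omegas, key=lambda w: np.linalg.det(ci_fuse(a, A, b, B, w)[1]))
    return (best, *ci_fuse(a, A, b, B, best))

# Two estimates that are each accurate along a different axis.
a, A = np.array([1.0, 0.0]), np.diag([4.0, 1.0])
b, B = np.array([0.0, 1.0]), np.diag([1.0, 4.0])
omega, c, C = ci_fuse_min_det(a, A, b, B)
print(omega, c, np.diag(C))
```

For this symmetric pair the scan selects the midpoint weight, and the fused covariance is smaller than either input along its poorly observed axis without assuming that the two estimates are independent.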

Therefore, given consistent estimates, CI can yield consistent updates. However, the CI equations only use a linear update rule and can only utilize the first two moments of the state estimate. In many problems these representations are extremely crude and there is a strong incentive to consider how CI could be extended to utilize additional distribution information when it is available.

The first author to consider this issue was Mahler [10], who observed the following. Suppose $P_a(x)$ is the pdf of a Gaussian distributed random variable,

\[ P_a(x) = \mathcal{N}\{x; a, A\}. \]

Raising it to a power $\omega$ and renormalizing gives

\[ P_a^{\omega}(x) \propto \mathcal{N}\{x; a, A/\omega\}. \]

In other words, the distribution is still a Gaussian with the same mean but the covariance has been scaled to $A/\omega$. Similarly, calculating and renormalizing $P^{(1-\omega)}(x_k \mid Z_k^b)$ leads to a Gaussian-distributed random variable with mean b and covariance $B/(1-\omega)$. Since CI resembles the KF with scaled covariance matrices, and since the KF is an application of Bayes' Rule with Gaussian distributed random variables, he extrapolated this observation to all distributions to give the expression

\[ P_{\omega}(x) = \frac{P_a^{\omega}(x)\, P_b^{1-\omega}(x)}{\int P_a^{\omega}(x)\, P_b^{1-\omega}(x)\,dx}. \tag{3} \]

For $0 < \omega < 1$, one heuristic interpretation is that raising a distribution to the power $\omega$ tends to "flatten" it. Because it becomes more uniform in nature, it becomes a more "conservative" estimate.
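The Gaussian property that motivates (3) can be verified in two lines. The derivation below is an added sketch using only the symbols already defined; normalizing constants are dropped throughout.

```latex
\begin{align*}
\mathcal{N}\{x; a, A\}^{\omega}
  &\propto \exp\!\left(-\tfrac{\omega}{2}\,(x-a)^{T} A^{-1} (x-a)\right) \\
  &= \exp\!\left(-\tfrac{1}{2}\,(x-a)^{T} (A/\omega)^{-1} (x-a)\right)
   \;\propto\; \mathcal{N}\{x; a, A/\omega\}.
\end{align*}
% Multiplying the two scaled Gaussians and completing the square
% recovers the CI update:
\begin{align*}
\mathcal{N}\{x; a, A/\omega\}\;\mathcal{N}\{x; b, B/(1-\omega)\}
  &\propto \mathcal{N}\{x; c, C\}, \\
C^{-1} &= \omega A^{-1} + (1-\omega) B^{-1}, \\
c &= C\left(\omega A^{-1} a + (1-\omega) B^{-1} b\right).
\end{align*}
```

For Gaussian inputs, therefore, (3) and the CI equations give the same mean and covariance for every $\omega$.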

Mahler proposed choosing a value of $\omega$ to maximize the "peakiness" of the distribution,

\[ (\omega, x) = \arg\sup_{\omega,\, x} P_{\omega}(x). \]

However, no actual studies were provided to show that this is a truly robust result and not simply a coincidence of the properties of Gaussian distributions.

Hurley independently made the same observations about Gaussians but noted that (3) is an equation used to calculate the Chernoff Information of a pair of distributions [11]. Chernoff Information quantifies the best achievable exponent in the Bayesian probability of error. It arises when constructing decision regions to minimize the probability of error and is extensively used in distributed target identification to determine the best achievable performance. Unlike Mahler's approach, the Chernoff Information is calculated across the entire distribution and is given by

\[ C(P_a, P_b) = -\min_{0 \le \omega \le 1} \log\left( \int P_a^{\omega}(x)\, P_b^{1-\omega}(x)\,dx \right). \tag{4} \]

The optimal value of $\omega$, $\omega^*$, has the property that

\[ D^* = D\left(P_{\omega^*}(x) \,\|\, P_a(x)\right) = D\left(P_{\omega^*}(x) \,\|\, P_b(x)\right) \]

where $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence. In other words, the Chernoff solution is equally distant from both of the prior distributions¹.
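Equation (4) and the equal-divergence property can be checked numerically in one dimension. The sketch below is illustrative only: it evaluates the integral on a uniform grid, scans $\omega$ over a fixed set of candidates, and uses two unit-variance Gaussians chosen for the example. All function names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def chernoff_information(pa, pb, dx, omegas):
    """Evaluate (4) on a grid; returns (Chernoff information, omega_star)."""
    # Log of the normalizing integral of (3) for each candidate omega.
    log_int = np.array([np.log(np.sum(pa ** w * pb ** (1.0 - w)) * dx) for w in omegas])
    k = int(np.argmin(log_int))
    return -log_int[k], omegas[k]

def kl_divergence(p, q, dx):
    """Grid approximation of D(p || q); assumes p and q are positive on the grid."""
    return np.sum(p * np.log(p / q)) * dx

x = np.linspace(-10.0, 12.0, 4401)
dx = x[1] - x[0]
pa = gauss_pdf(x, 0.0, 1.0)
pb = gauss_pdf(x, 2.0, 1.0)

info, w_star = chernoff_information(pa, pb, dx, np.linspace(0.01, 0.99, 99))

# Fused density (3) at omega_star, renormalized on the grid.
p_star = pa ** w_star * pb ** (1.0 - w_star)
p_star /= np.sum(p_star) * dx

d_a = kl_divergence(p_star, pa, dx)
d_b = kl_divergence(p_star, pb, dx)
print(info, w_star, d_a, d_b)
```

For equal-variance Gaussians the minimizing weight is the midpoint and the fused density sits halfway between the two inputs, so the two divergences coincide. For GMMs the same grid evaluation applies in principle, but its cost grows rapidly with the state dimension.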

Because Chernoff Information is a general result which applies to all probability distributions, it suggests that its relationship with CI is not a mere coincidence of the Gaussian form and thus might have general applicability. However, to our knowledge few authors have attempted to develop distributed data fusion algorithms using Chernoff Information. The only paper we are aware of which uses this approach is a study by Hwang [17], who compared several different approaches to distributed hypothesis testing for target identification. His study looked at the effects of sensors which misclassified data. His results suggested that Chernoff Information was worse in the sense that the estimate was more heavily affected by the incorrect sensors. However, these results can also be interpreted in a positive light: the Chernoff Information filter was more susceptible to the information because it did not converge as tightly to a single identification hypothesis.

These theoretical studies and empirical results suggest that Chernoff Information might offer a strategy for distributed data fusion. We now consider the application of Chernoff Information to an important class of probability distributions: the Gaussian Mixture Models.


3 Approximate Chernoff Information for Gaussian Mixture Models

A Gaussian Mixture Model (GMM) is a probability distribution function which can be written as the sum of a set of weighted Gaussian kernels. Therefore,

\[ P_a(x) = \sum_{i=1}^{N_a} p_i\, \mathcal{N}\{x; a_i, A_i\}, \qquad P_b(x) = \sum_{j=1}^{N_b} q_j\, \mathcal{N}\{x; b_j, B_j\}. \tag{5} \]

We seek a closed form analytical approximation of the Chernoff solution that is itself a GMM, of the form given in (6).

There are three reasons why we consider pdfs of this form. First, GMMs are a very natural extension of the mean and covariance representation. Second, the family of GMMs is, in principle, extremely general and almost any pdf can be represented to arbitrary precision by a GMM. Furthermore, many distributions can be approximated well by a small number of terms of a GMM. Therefore, it is an extremely important practical distribution. Finally, GMMs share a strong theoretical relationship with Multiple Hypothesis Tracking (MHT). As we discuss in the conclusions, MHT might offer a mechanism for generalizing these results even when the mixtures are not Gaussian distributed.

Upcroft proposed a form of CI for GMMs which we term the Pairwise Component CI (PCCI) fusion rule [12]. Given the two distributions $P_a(x)$ and $P_b(x)$, CI is applied to each pair of components in turn. Let $\omega_{ij}$ be the weight applied to the fused estimate from the ith component of $P_a(x)$ and the jth component of $P_b(x)$. Then $N_c = N_a N_b$ and the ijth component is given by

\[ C_{ij}^{-1} = \omega_{ij} A_i^{-1} + (1-\omega_{ij}) B_j^{-1} \]
\[ c_{ij} = C_{ij}\left(\omega_{ij} A_i^{-1} a_i + (1-\omega_{ij}) B_j^{-1} b_j\right) \]
\[ r_{ij} = \frac{\omega_{ij}\, p_i + (1-\omega_{ij})\, q_j}{\sum_{k=1}^{N_a}\sum_{l=1}^{N_b} \left[\omega_{kl}\, p_k + (1-\omega_{kl})\, q_l\right]}. \tag{7} \]

Heuristically, this form is motivated by the fact that when $\omega_{ij} = 1$ the estimate should only contain the component from $P_a(x)$ whereas if $\omega_{ij} = 0$ then the estimate should only contain the component from $P_b(x)$. Furthermore, CI can be applied to each component independently. However, this form is an extremely poor approximation to the Chernoff solution. This is illustrated in Figure 1. The figure shows contour lines of the pdfs of two input estimates, the Chernoff Information solution (calculated numerically) and the PCCI when $\omega_{ij}$ is chosen to minimize the determinant of $C_{ij}$. As can be seen, the Chernoff Information solution has a single strong mode whereas the PCCI maintains multiple modes. Furthermore, the PCCI tends to underweight the middle mode, which is closest to the Chernoff solution.

This can be quantified using a metric proposed by Comaniciu [18]. The metric quantifies the distance between two distributions $P(x)$ and $\hat{P}(x)$ and is given by

\[ d = \sqrt{1 - \rho\left[P(x), \hat{P}(x)\right]} \]

where

\[ \rho\left[P(x), \hat{P}(x)\right] = \int \sqrt{P(x)\,\hat{P}(x)}\;dx \]

is the Bhattacharyya Coefficient. This metric has the property that its value lies between 0 and 1. Here the approximate solution being assessed is the $N_c$ component GMM

\[ P_c(x) = \sum_{i=1}^{N_c} r_i\, \mathcal{N}\{x; c_i, C_i\}. \tag{6} \]

The results for this example are shown in Table 1. For the PCCI, the metric has a high value of 0.73.

¹ This is distinct from O'Brien's Fusion of Correlated Probabilities (FCP) algorithm [16]. This algorithm used the expression

\[ \frac{P_a^{\alpha}(x)\, P_b^{\beta}(x)}{\int P_a^{\alpha}(x)\, P_b^{\beta}(x)\,dx}. \]

No conditions were placed on the values of $\alpha$ and $\beta$ and thus it could be interpreted as a generalization of Chernoff Information. However, no theoretical analysis has been provided for this form.
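A Bhattacharyya-based distance of this kind is easy to evaluate on a grid for one-dimensional mixtures. The sketch below is illustrative: it assumes the square-root form $d = \sqrt{1 - \rho}$, uses hypothetical function names, and the mixtures are made up for the example rather than taken from Figure 1.

```python
import numpy as np

def gmm_pdf_1d(x, weights, means, variances):
    """Evaluate a 1D Gaussian mixture on the points x."""
    x = np.asarray(x, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    comps = np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
    return comps @ w

def bhattacharyya_distance(p, p_hat, dx):
    """d = sqrt(1 - rho), with rho the Bhattacharyya coefficient on the grid."""
    rho = np.sum(np.sqrt(p * p_hat)) * dx
    # Clip tiny negative values caused by rounding before taking the root.
    return np.sqrt(max(0.0, 1.0 - rho))

x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
p = gmm_pdf_1d(x, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
p_close = gmm_pdf_1d(x, [0.6, 0.4], [-2.0, 2.5], [1.0, 1.5])
p_far = gmm_pdf_1d(x, [1.0], [15.0], [1.0])

d_same = bhattacharyya_distance(p, p, dx)
d_close = bhattacharyya_distance(p, p_close, dx)
d_far = bhattacharyya_distance(p, p_far, dx)
print(d_same, d_close, d_far)
```

Identical densities give a distance near 0 and densities with negligible overlap give a distance near 1, which matches the stated range of the metric.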

Given this deficiency, we seek a more accurate calculation of the Chernoff Information. However, (3) does not have a closed form solution for a GMM. Its value could be approximated numerically using a grid. Although the sparse methods proposed by Bucy and Senne [19] can be applied to reduce the computational costs, the curse of dimensionality means that, in general, this approach is prohibitively expensive. A similar difficulty arises in evaluating (4).

In this paper we use two approximations. First, to calculate the powers of the mixtures we use the first order approximation

\[ \left(\sum_{i=1}^{n} x_i\right)^{\omega} \approx \sum_{i=1}^{n} x_i^{\omega}. \]

Substituting into (5),

\[ P_a^{\omega}(x) \approx \frac{1}{\sum_{i=1}^{N} p_i^{\omega}} \sum_{i=1}^{N} p_i^{\omega}\, \mathcal{N}\{x; a_i, A_i/\omega\}. \tag{8} \]

In other words, it leads to an N component GMM. The means of each component remain the same but all the covariances are scaled up by a factor of $1/\omega$ and all weights have been renormalized. A similar expression is used to calculate $P_b^{1-\omega}(x)$.

Therefore, the update rule has $N_c = N_a N_b$ components and can be written as

\[ C_{ij}^{-1} = \omega A_i^{-1} + (1-\omega) B_j^{-1} \]
\[ c_{ij} = C_{ij}\left(\omega A_i^{-1} a_i + (1-\omega) B_j^{-1} b_j\right) \]
\[ r_{ij} = \frac{p_i^{\omega}\, q_j^{(1-\omega)}}{\sum_{k=1}^{N_a}\sum_{l=1}^{N_b} p_k^{\omega}\, q_l^{(1-\omega)}}. \tag{9} \]

The effect of this approximation is illustrated in Figure 1(e). The results were calculated using the first order expansion approximation and the value of $\omega$ calculated by the numerical Chernoff solution. As can be seen, the estimate consists of a large, interconnected mass whose main peak lies in the same location as the Chernoff solution. Despite the fact that the solution does not have any well-defined modes, Table 1 shows that its cost is smaller than that for PCCI.

The second difficulty is to approximate the calculation of $\omega$ such that (4) is satisfied. This could be calculated by Monte Carlo integration [20] or by adapting the distance approximation developed by Goldberger to measure the dissimilarity between two GMMs [21]². However, this introduces its own approximations and we do not investigate their effects here. Rather, we use the (somewhat crude) approach of minimizing the covariance of the entire mixture,

\[ c = \sum_{i=1}^{N_a}\sum_{j=1}^{N_b} r_{ij}\, c_{ij} \]
\[ C = \sum_{i=1}^{N_a}\sum_{j=1}^{N_b} r_{ij}\left(C_{ij} + c_{ij} c_{ij}^{T}\right) - c\,c^{T}. \tag{10} \]

Figure 1: Contour plots of pdfs for different fusion algorithms. Panels: (c) Chernoff; (d) PCCI; (e) Pseudo-Chernoff 1; (f) Pseudo-Chernoff 2.

Algorithm            Cost
PCCI                 0.7286
Pseudo-Chernoff 1    0.6608
Pseudo-Chernoff 2    0.6347

Table 1: The costs of the different approximations.

The result of the Pseudo-Chernoff algorithm is shown in Figure 1(f). This distribution possesses two distinct modes, one over the Chernoff solution, the other offset to the right. The cost metric shows that the approximation has a smaller cost than both PCCI and the approximation that uses the $\omega$ calculated by the numerical Chernoff solution, and thus shows that it is a more accurate approximation.
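The pseudo-Chernoff update in (9) and the mixture moments in (10) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, "minimizing the covariance" is interpreted here as minimizing the determinant of C in (10), and $\omega$ is found by a simple grid scan.

```python
import numpy as np

def pseudo_chernoff_fuse(p, a, A, q, b, B, omega):
    """First order fusion of two GMMs following (9).

    p, q: weights, shapes (Na,), (Nb,); a, b: means, shapes (Na, d), (Nb, d);
    A, B: covariances, shapes (Na, d, d), (Nb, d, d).
    Returns weights (Na*Nb,), means (Na*Nb, d), covariances (Na*Nb, d, d).
    """
    r, c, C = [], [], []
    for i in range(len(p)):
        Ai_inv = np.linalg.inv(A[i])
        for j in range(len(q)):
            Bj_inv = np.linalg.inv(B[j])
            Cij = np.linalg.inv(omega * Ai_inv + (1.0 - omega) * Bj_inv)
            cij = Cij @ (omega * Ai_inv @ a[i] + (1.0 - omega) * Bj_inv @ b[j])
            r.append(p[i] ** omega * q[j] ** (1.0 - omega))
            c.append(cij)
            C.append(Cij)
    r = np.array(r)
    return r / r.sum(), np.array(c), np.array(C)

def mixture_moments(r, c, C):
    """Overall mean and covariance of a mixture, as in (10)."""
    mean = r @ c
    second = np.einsum("n,nij->ij", r, C + np.einsum("ni,nj->nij", c, c))
    return mean, second - np.outer(mean, mean)

def choose_omega(p, a, A, q, b, B, n_grid=99):
    """Grid scan for the omega minimizing det of the mixture covariance (assumed cost)."""
    def cost(w):
        _, cov = mixture_moments(*pseudo_chernoff_fuse(p, a, A, q, b, B, w))
        return np.linalg.det(cov)
    return min(np.linspace(0.01, 0.99, n_grid), key=cost)

# Two-component mixture fused with a single Gaussian (made-up numbers).
p = np.array([0.7, 0.3])
a = np.array([[0.0, 0.0], [4.0, 0.0]])
A = np.array([np.diag([4.0, 1.0]), np.diag([1.0, 1.0])])
q = np.array([1.0])
b = np.array([[1.0, 1.0]])
B = np.array([np.diag([1.0, 4.0])])

w = choose_omega(p, a, A, q, b, B)
r, c, C = pseudo_chernoff_fuse(p, a, A, q, b, B, w)
print(w, r, mixture_moments(r, c, C)[0])
```

With one component in each input the routine reduces to the CI update, which gives a quick sanity check. The nested loop makes the $N_a N_b$ growth in the number of components explicit.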

We  now  show  the  effect  of  this  suboptimal  solution 
in  a  target  tracking  example. 

4  Example 

A sensor network, consisting of the five nodes listed in Table 2, attempts to estimate the position and velocity of a target in 2D. Each sensor has its own detection range and sensor error characteristics. Nodes 1, 2 and 3 measure the bearing to the target and nodes 4 and 5 measure the range. Therefore, the state of the target is not observable from any single node.

Node  Position    Velocity  Type     Range  Uncertainty
1     (0,0)       (0,0)     Bearing  2000   0.5°
2     (100,0)     (-1,0)    Bearing  2000   2°
3     (0,1000)    (0,-2)    Bearing  200    1°
4     (0,-1000)   (0,0)     Range    800    10
5     (0,1000)    (0,0)     Range    1200   10

Table 2: The location, velocity, type, detection range and accuracy for the sensors used in the example.

Figure 2: The GMMs generated by the different sensor types. The location of the target is given by * and the location of the node by o. Sensor measurements are the dashed lines. For each component of the GMM, the mean is shown as the + and the 3σ covariance ellipse is shown as the solid line.

² Both of these methods depend on the observation that

\[ c = \int P_a^{\omega}(x)\, P_b^{(1-\omega)}(x)\,dx = \int P_a(x) \left(\frac{P_b(x)}{P_a(x)}\right)^{1-\omega} dx. \]

In other words, the expectation can be taken with respect to $P_a(x)$. Since this probability distribution is known it does not have to be approximated.

Each node takes a measurement once per second and the probability of detection is 1. When the measurements are first received, each node initializes a GMM to represent the available information. The initialized results are shown in Figure 2. The bearing-only sensor uses the range parameterized Kalman filter proposed by Peach [9]. Peach observed that, for bearings-only tracking, a Kalman filter using modified polar Cartesian coordinates is consistent providing


A bank of filters is initialized, each with the same weight, and each using the bearing estimate and a nominal range to initialize the track position. In our experiments, we found that if we used the Unscented Transformation [22] and Cartesian coordinates, we could increase the bound in the above inequality to 0.5. To initialize the range-only estimate we used an angle parameterized Kalman filter [8]. This is conceptually very similar to the range-parameterized filter: a set of hypotheses is generated for different nominal values of the bearing. The range data was extracted from the sensor and the bearing covariance was set to be sufficiently large to ensure that adjacent covariance ellipses overlapped. Experiments indicated that a 45° spacing, leading to 8 modes, was sufficient to provide stable tracking.
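The angle parameterized initialization for a range-only sensor can be sketched as below. Several simplifications are assumed and are not from the paper: only the 2D position is initialized (no velocity), the polar-to-Cartesian conversion is linearized with a Jacobian rather than using the Unscented Transformation, and the bearing standard deviation is set to half the angular spacing as a hypothetical way of making adjacent ellipses overlap. The function name is also hypothetical.

```python
import numpy as np

def init_range_only_gmm(sensor_xy, measured_range, sigma_r, n_modes=8, overlap=1.0):
    """Angle parameterized GMM over target position from one range measurement."""
    spacing = 2.0 * np.pi / n_modes          # 45 degrees when n_modes == 8
    sigma_theta = overlap * spacing / 2.0    # hypothetical choice of bearing spread
    weights = np.full(n_modes, 1.0 / n_modes)
    means, covs = [], []
    for k in range(n_modes):
        theta = k * spacing                  # nominal bearing of this hypothesis
        ct, st = np.cos(theta), np.sin(theta)
        means.append(np.asarray(sensor_xy, dtype=float)
                     + measured_range * np.array([ct, st]))
        # Jacobian of (r, theta) -> (x, y), used to map polar noise to Cartesian.
        J = np.array([[ct, -measured_range * st],
                      [st,  measured_range * ct]])
        covs.append(J @ np.diag([sigma_r ** 2, sigma_theta ** 2]) @ J.T)
    return weights, np.array(means), np.array(covs)

# Node 4 of Table 2, with a made-up range measurement of 800.
weights, means, covs = init_range_only_gmm((0.0, -1000.0), 800.0, 10.0)
print(means.round(1))
```

Each component is narrow along the line of sight and wide across it, so the eight ellipses together approximate the ring of positions consistent with the measured range.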

Once a sensor had initialized a target, each mode was predicted and updated locally using the standard GMM update rules [23]. After every 10 time steps the nodes compressed their estimates for distribution and broadcast their estimates to other nodes. The compression step merged the GMM estimates in each node into a mixture of four components. This step was carried out for two reasons. The first was to reduce the number of parameters which must be distributed between the nodes. Second, by reducing the number of modes, the computational cost of the update algorithm is greatly reduced. We used the Integral Squared Error Reduction algorithm (ISER) developed by Williams and Maybeck [24]. This algorithm uses a greedy approach to merge components such that the integral squared distance between the original distribution, $P(x)$, and the approximate distribution, $\hat{P}(x)$,

\[ \int \left(P(x) - \hat{P}(x)\right)^2 dx \]

is minimized.
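For two GMMs the integral squared error has a closed form, because the integral of a product of two Gaussians is itself a Gaussian evaluation: $\int \mathcal{N}\{x; a, A\}\,\mathcal{N}\{x; b, B\}\,dx = \mathcal{N}\{a; b, A + B\}$. The sketch below evaluates only this cost; it does not implement the greedy merging procedure of [24], and the function names are hypothetical.

```python
import numpy as np

def gauss_overlap(a, A, b, B):
    """Integral of N(x; a, A) N(x; b, B) dx, which equals N(a; b, A + B)."""
    d = len(a)
    S = A + B
    diff = a - b
    quad = diff @ np.linalg.solve(S, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))

def cross_term(w1, m1, P1, w2, m2, P2):
    """Integral of the product of two GMMs."""
    return sum(w1[i] * w2[j] * gauss_overlap(m1[i], P1[i], m2[j], P2[j])
               for i in range(len(w1)) for j in range(len(w2)))

def integral_squared_error(w, m, P, w_hat, m_hat, P_hat):
    """ISE between GMM (w, m, P) and its approximation (w_hat, m_hat, P_hat)."""
    return (cross_term(w, m, P, w, m, P)
            - 2.0 * cross_term(w, m, P, w_hat, m_hat, P_hat)
            + cross_term(w_hat, m_hat, P_hat, w_hat, m_hat, P_hat))

# A 1D two-component mixture and its moment-matched single Gaussian.
w = np.array([0.5, 0.5])
m = np.array([[-1.0], [1.0]])
P = np.array([[[1.0]], [[1.0]]])
w_hat = np.array([1.0])
m_hat = np.array([[0.0]])
P_hat = np.array([[[2.0]]])

ise = integral_squared_error(w, m, P, w_hat, m_hat, P_hat)
ise_self = integral_squared_error(w, m, P, w, m, P)
print(ise, ise_self)
```

Because every term is analytic, a merging algorithm can compare candidate merges without any numerical integration over the state space.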

Each node broadcast its state estimate to all other nodes. The probability that an update was received was 70%. Furthermore, no acknowledgment scheme was used and so no node knows whether its communication was received. Therefore, the topology of the network is, in effect, ad hoc and time varying.

If a node received a broadcast estimate, it fused that estimate into its local estimate using one of the update schemes, and the number of components was reduced back to 4 (for nodes 1-3) or 8 (for nodes 4-5) using ISER to prevent the combinatorial explosion in the number of terms in the GMM.

Three  algorithms  were  tested: 

1. Naive Bayes. This assumes that the estimates are independent. The standard GMM equations are used to fuse local and remote estimates.

2. PCCI. This uses the PCCI equations in (7) with the cost function chosen to minimize the determinant of each $C_{ij}$.

3. Pseudo-Chernoff. This uses the pseudo-Chernoff equations in (9) with the cost chosen to minimize C in (10).

Each algorithm has qualitatively different results, as illustrated in Figure 3. This figure plots the components of each estimate for each node at time step 31. At this time step the target has been detected by nodes 1, 2 and 5 and the nodes have just completed a distributed update. In this instance, all nodes received updates from all other nodes. Because both range and bearing data are being fused together, the results should lead to an estimate which is tightly clustered about the intersection of the range and bearing measurements. However, the components of the naive Bayes estimate, shown in Figure 3(a), are scattered around the intersection region. The PCCI algorithm, on the other hand, scatters its components much more widely. As can be seen in Figure 3(b), the estimates lie in two main clusters. The first cluster lies to the left and is near the intersection between the range and bearing estimates. The second cluster lies to the right and is, in fact, behind the range sensor. The results from the pseudo-Chernoff algorithm are shown in Figure 3(c) and, as can be seen, all the components lie at the intersection region.


Figure 3: The state of the estimate at time step 30. Panel (a): Naive Bayes. The location of the true target is * (most clearly seen in the first figure). The locations of the sensing platforms are o. The measurements taken by the bearing sensors are shown as lines, those for the range sensors as circles. The 4σ covariance ellipses for each component in each node estimate are plotted.

Figure 4: The mean squared errors in the estimates.


The above results illustrate, not surprisingly, that the naive Bayes algorithm leads to an extremely poor approximation to the state estimate. Furthermore, it could be argued that the PCCI is more conservative: rather than place all of the components in one location, it tends to distribute them more. These results are partially confirmed by Figure 4, which plots the mean squared errors in each algorithm. As can be seen, the MSE in the naive Bayes and PCCI are similar to one another. The MSE in the pseudo-Chernoff algorithm is significantly smaller.

However, the apparently conservative nature of the PCCI algorithm is not evident in Figure 5. This figure plots the actual mean squared error in x versus the mean standard deviation (calculated from the covariance matrix) for 100 Monte Carlo runs. As can be seen, the errors in the PCCI algorithm show regular spikes due to the distributed updates. In a number of instances, the true mean squared error is greater than that calculated by the filter. In contrast, the mean squared error in the Pseudo-Chernoff algorithm actually falls at each update step and the true mean squared error is less than that estimated by the filter.

5  Conclusions 

This paper has conducted an empirical study into the use of Chernoff information to provide robust algorithms for the fusion of GMMs with unmodelled correlations in distributed environments. We have derived a first order approximation which we have shown empirically, in a distributed tracking example, to be consistent and more accurate than the PCCI. These results provide additional evidence that Chernoff information provides a potentially valuable extension of CI to more general classes of probabilistic distributions.

There are several issues to be addressed. First and foremost, it is still unclear what properties are actually guaranteed by the Chernoff Information. The CI algorithm has the property that, providing the conditions in (1) are satisfied, then (2) is satisfied as well. However, it is not clear if an equivalent condition can be specified for the input and output distributions. Second, a more detailed analysis of the effects of the first-order approximation must be carried out. A higher order expansion should, for example, lead to a more accurate estimate. Third, experiments should be conducted to explore the effect of using different, and potentially more accurate, cost functions on the estimate.
Figure 5: The log mean squared error estimates and covariances in the x estimates for filter 1, calculated from 100 Monte Carlo runs.